Calculate polarity at axis #10
Polarity = 1 - 0/5 = 1.0

Calculate polarity at axis #11
Polarity = 1 - 3/5 = 0.4

Calculate polarity at axis #12
Polarity = 1 - 0/5 = 1.0

Calculate polarity at axis #13
Polarity = 1 - 3/5 = 0.4

Calculate polarity at axes #14-21
Polarity = 1 - 1/5 = 0.8
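The worked values above all follow the same arithmetic: one minus the number of matching positions divided by the window size. A minimal Python sketch of that formula is given here for illustration only; the function name `polarity` and its parameter names are assumptions, not names used in the figures.

```python
def polarity(match_count, window):
    """Return 1 - (match_count / window), the formula used in the worked values.

    `match_count` and `window` are illustrative names (assumptions).
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if not 0 <= match_count <= window:
        raise ValueError("match_count must lie between 0 and window")
    return 1 - match_count / window


if __name__ == "__main__":
    # Window of 5, with 0, 3 and 1 matching positions as in the worked values.
    for matches in (0, 3, 1):
        print(f"Polarity = 1 - {matches}/5 = {polarity(matches, 5):.1f}")
```

With a window of 5, the three calls correspond to the 1.0, 0.4 and 0.8 values listed above.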
A. HIV2 complete genome
B. Random sequence

Extended regions of increased polarity. The peaks represent regions of 500-600 nucleotides in which polarity values that deviate from the expected 0.75 are concentrated.
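The expected value of 0.75 can be reproduced under one simple assumption, which is not stated in the figure itself: that the four bases occur independently and with equal probability. Under that assumption a base equals the complement of its mirror partner in 4 of the 16 possible pairings, so the expected polarity is 1 - 4/16. The sketch below enumerates the pairings; all names in it are illustrative.

```python
from itertools import product

# Watson-Crick complements, matching the G/A/T/C -> C/T/A/G translation.
COMPLEMENT = {"G": "C", "A": "T", "T": "A", "C": "G"}


def expected_polarity_uniform():
    """Expected polarity assuming independent, equiprobable bases (an assumption)."""
    pairs = list(product("GATC", repeat=2))
    # A position counts as a match when a base equals the complement of its partner.
    matches = sum(1 for left, right in pairs if left == COMPLEMENT[right])
    return 1 - matches / len(pairs)


if __name__ == "__main__":
    print(expected_polarity_uniform())
```

A sequence with a skewed base composition would have a different baseline, so the 0.75 figure applies only under the equiprobable assumption.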
1. Enter input sequence of desired length
2. Enter window size
3. Axis chosen (a)
4. Calculate and output polarity value (b)
5. Move to next axis
6. Calculate and output polarity value around new axis, then... (c)
7. Output ordered list of polarity values
8. Graph these values (d)
9. Statistical analysis of observed vs. predicted
10. Identify regions of extended polarity

a Starting at position = (2*window of symmetry)
b [1-(S/W)]
c Up to and including axis position = [2*length - (2*window size)]
d Can use a moving average of values (with number of values averaged and increment of moving being variable) to smooth curve
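Note d mentions smoothing the curve with a moving average in which both the number of values averaged and the increment are variable. One possible reading is sketched below in Python; the function name `moving_average` and the parameters `span` and `step` are assumptions, and the source does not say how the ends of the list are handled (this sketch simply drops incomplete windows).

```python
def moving_average(values, span, step=1):
    """Average `span` consecutive values, advancing `step` values each time.

    Incomplete windows at the end of the list are dropped (an assumption).
    """
    if span <= 0 or step <= 0:
        raise ValueError("span and step must be positive integers")
    last_start = len(values) - span
    return [
        sum(values[start:start + span]) / span
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    polarity_values = [1.0, 0.4, 1.0, 0.4, 0.8, 0.8, 0.8]
    print(moving_average(polarity_values, span=3))
    print(moving_average(polarity_values, span=3, step=2))
```

A larger `span` gives a smoother curve at the cost of positional resolution, and a larger `step` thins the output.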
The algorithm was implemented in the PERL programming language.
PERL variable names and function names are in boldface.

import input sequence as a string variable ($input_seq)

prompt user for ($win_sym) length

cut a length of (2*$win_sym) bases from the 5' end of $input_seq and assign this substring into $win_seq

perform (2*$win_sym) iterations:
    chop (cut last character off a string and return it) $win_seq
    unshift chopped character (prepend it to the front of an array, moving all elements one step to the right) into an initially empty indexed array (@trgt_fwd)

translate $win_seq so that every G or A or T or C of the original string becomes C or T or A or G, respectively, of the translated string

perform (2*$win_sym) iterations:
    chop the translated $win_seq
    push chopped character (stack it onto the end of an array, without moving the other elements) onto an initially empty indexed array (@trgt_revcomp)

assign $match_count = 0

assign index variable $i = 1

perform ($win_sym) iterations:
    compare between the $i th elements of arrays @trgt_fwd and @trgt_revcomp; if equal then increase $match_count by 1, else nothing
    increase $i by 1

calculate $asym_count = 1 - ($match_count / $win_sym)

push $asym_count onto initially empty indexed array @axis_list

while $input_seq is not empty:

    cut one base off of the 5' end of $input_seq and assign it into $basefeed

    translate $basefeed GATC->CTAG respectively and assign the translated character into $basefeed_comp

    unshift $basefeed_comp (prepend it to the front of an array, moving all elements one step to the right) into array @trgt_revcomp

    pop (remove last element of an array) @trgt_revcomp

    assign $match_count = 0

    assign index variable $i = 1

    perform ($win_sym) iterations:
        compare between the $i th elements of arrays @trgt_fwd and @trgt_revcomp; if equal then increase $match_count by 1, else nothing
        increase $i by 1

    calculate $asym_count = 1 - ($match_count / $win_sym)

    push $asym_count onto @axis_list

    shift (remove first element of an array and move all elements one step leftward) @trgt_fwd

    push $basefeed onto array @trgt_fwd

    assign $match_count = 0

    assign index variable $i = 1

    perform ($win_sym) iterations:
        compare between the $i th elements of arrays @trgt_fwd and @trgt_revcomp; if equal then increase $match_count by 1, else nothing
        increase $i by 1

    calculate $asym_count = 1 - ($match_count / $win_sym)

    push $asym_count onto @axis_list

save @axis_list to file
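The pseudocode above can be sketched compactly in Python. This is an illustrative translation, not the original PERL program: the function name `polarity_scan` is an assumption, a `deque` stands in for the PERL arrays, string slicing and reversal replace the chop/unshift/push loops, and the comparison is assumed to cover the first `win_sym` elements of each array counted from zero. Characters other than G, A, T and C are passed through untranslated.

```python
from collections import deque

# G/A/T/C -> C/T/A/G, as in the translate step of the pseudocode.
COMPLEMENT = str.maketrans("GATC", "CTAG")


def polarity_scan(input_seq, win_sym):
    """Return the list of 1 - (matches / win_sym) values, two per base fed in."""
    input_seq = input_seq.upper()
    if win_sym <= 0 or len(input_seq) < 2 * win_sym:
        raise ValueError("sequence must hold at least 2 * win_sym bases")

    win_seq = input_seq[:2 * win_sym]
    remaining = input_seq[2 * win_sym:]
    trgt_fwd = deque(win_seq)                                 # window, 5' to 3'
    trgt_revcomp = deque(win_seq.translate(COMPLEMENT)[::-1])  # reverse complement

    def asym_count():
        # Compare the first win_sym elements of both arrays position by position.
        matches = sum(1 for i in range(win_sym) if trgt_fwd[i] == trgt_revcomp[i])
        return 1 - matches / win_sym

    axis_list = [asym_count()]
    for basefeed in remaining:
        # First half step: only the reverse-complement array advances.
        trgt_revcomp.appendleft(basefeed.translate(COMPLEMENT))
        trgt_revcomp.pop()
        axis_list.append(asym_count())
        # Second half step: the forward array catches up.
        trgt_fwd.popleft()
        trgt_fwd.append(basefeed)
        axis_list.append(asym_count())
    return axis_list


if __name__ == "__main__":
    # GAATTC equals its own reverse complement, so the first axis scores 0.
    print(polarity_scan("GAATTCA", 3))
```

Advancing the two arrays one at a time is what yields two axis values per incoming base, which fits the axis positions in the flowchart notes being expressed in units of twice the sequence length.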
self mirror

Purine pyrimidine dyad

Figure 5A

Figure 5B
W = window

101 Enter input sequence of desired length
102 Enter window size
103 Axis chosen (a)
104 Count DPT frequencies, calculate and output DPT residuals and chi square value
105 and 105* Move to next axis (b)
106 and 106* Count DPT frequencies, calculate and output DPT residuals and chi square value around new axis, then...
107 Save to file: ordered arrays of DPT frequencies, DPT residuals and chi square values
108 Further statistical analysis of observed vs. predicted
109 Graph values (c)
110 Identify functional elements

a Starting at axis position = (2*window size)
b Up to and including axis position = [2*length - (2*window size)]
c Values include DPT frequencies, statistical measures including residuals and %
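Step 104 for a single axis can be sketched as follows in Python. The sketch assumes, in line with the pseudocode elsewhere in this document, that a DPT is the pair formed by the i-th element of the forward array and the i-th element of the reverse-complement array, and that each of the 16 DPTs is expected (1/16) x window times. The function name `dpt_chi_square` and its return layout are assumptions; characters other than G, A, T and C raise a `KeyError` here.

```python
from itertools import product

# The 16 DPTs in the order GG, GA, GT, GC, AG, ... CC.
DPTS = ["".join(pair) for pair in product("GATC", repeat=2)]


def dpt_chi_square(trgt_fwd, trgt_revcomp, win_sym):
    """Count DPTs over win_sym positions; return (counts, residuals, chi_sq)."""
    if win_sym <= 0:
        raise ValueError("win_sym must be a positive integer")
    counts = dict.fromkeys(DPTS, 0)
    for i in range(win_sym):
        counts[trgt_fwd[i] + trgt_revcomp[i]] += 1

    expected = win_sym / 16  # uniform expectation per DPT
    residuals = {dpt: counts[dpt] - expected for dpt in DPTS}
    chi_sq = sum(res ** 2 / expected for res in residuals.values())
    return counts, residuals, chi_sq


if __name__ == "__main__":
    counts, residuals, chi_sq = dpt_chi_square("GAATTC", "GAATTC", 3)
    print(counts["GG"], counts["AA"], round(chi_sq, 4))
```

When every DPT occurs exactly as often as expected the chi square value is zero, and it grows as the counts concentrate in a few DPTs.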
The algorithm was implemented in the PERL programming language.
PERL variable names and function names are in boldface.

import input sequence as a string variable ($input_seq)

prompt user for ($win_sym) length

cut a length of (2*$win_sym) bases from the 5' end of $input_seq and assign this substring into $win_seq

perform (2*$win_sym) iterations:
    chop (cut last character off a string and return it) $win_seq
    unshift chopped character (prepend it to the front of an array, moving all elements one step to the right) into an initially empty indexed array (@trgt_fwd)

translate $win_seq so that every G or A or T or C of the original string becomes C or T or A or G, respectively, of the translated string

perform (2*$win_sym) iterations:
    chop the translated $win_seq
    push chopped character (stack it onto the end of an array, without moving the other elements) onto an initially empty indexed array (@trgt_revcomp)

104

assign $gg_count = 0, $ga_count = 0... one such variable for each of the 16 DPTs.

assign index variable $i = 1

perform ($win_sym) iterations:
    if the $i th elements of arrays @trgt_fwd and @trgt_revcomp are G and G, respectively, then increase $gg_count by 1. Repeat this conditional operation for each of the 16 possible DPTs.
    increase $i by 1

calculate residuals: $gg_res = $gg_count - ((1/16) x $win_sym). Repeat this calculation for each of the 16 possible DPTs.

push $gg_res onto @gg_res. Repeat this step for each of the 16 possible DPTs.

Calculate chi square: $chi_sq = (($gg_res**2) / ((1/16) x $win_sym) +
    ($ga_res**2) / ((1/16) x $win_sym) +
    ($gt_res**2) / ((1/16) x $win_sym) +
    ($gc_res**2) / ((1/16) x $win_sym) +
    ($ag_res**2) / ((1/16) x $win_sym) +
    ($aa_res**2) / ((1/16) x $win_sym) +
    ($at_res**2) / ((1/16) x $win_sym) +
    ($ac_res**2) / ((1/16) x $win_sym) +
    ($tg_res**2) / ((1/16) x $win_sym) +
    ($ta_res**2) / ((1/16) x $win_sym) +
    ($tt_res**2) / ((1/16) x $win_sym) +
    ($tc_res**2) / ((1/16) x $win_sym) +
    ($cg_res**2) / ((1/16) x $win_sym) +
    ($ca_res**2) / ((1/16) x $win_sym) +
    ($ct_res**2) / ((1/16) x $win_sym) +
    ($cc_res**2) / ((1/16) x $win_sym))

push $chi_sq onto @chi_sq.
106

while $input_seq is not empty:

    cut one base off of the 5' end of $input_seq and assign it into $basefeed

    translate $basefeed GATC->CTAG respectively and assign the translated character into $basefeed_comp

    unshift $basefeed_comp (prepend it to the front of an array, moving all elements one step to the right) into array @trgt_revcomp

    pop (remove last element of an array) @trgt_revcomp

    assign $gg_count = 0, $ga_count = 0... one such variable for each of the 16 DPTs.

    assign index variable $i = 1

    perform ($win_sym) iterations:
        if the $i th elements of arrays @trgt_fwd and @trgt_revcomp are G and G, respectively, then increase $gg_count by 1. Repeat this conditional operation for each of the 16 possible DPTs.
        increase $i by 1

    calculate residuals: $gg_res = $gg_count - ((1/16) x $win_sym). Repeat this calculation for each of the 16 possible DPTs.

    push $gg_res onto @gg_res. Repeat this step for each of the 16 possible DPTs.

    Calculate chi square: $chi_sq = (($gg_res**2) / ((1/16) x $win_sym) +
        ($ga_res**2) / ((1/16) x $win_sym) +
        ($gt_res**2) / ((1/16) x $win_sym) +
        ($gc_res**2) / ((1/16) x $win_sym) +
        ($ag_res**2) / ((1/16) x $win_sym) +
        ($aa_res**2) / ((1/16) x $win_sym) +
        ($at_res**2) / ((1/16) x $win_sym) +
        ($ac_res**2) / ((1/16) x $win_sym) +
        ($tg_res**2) / ((1/16) x $win_sym) +
        ($ta_res**2) / ((1/16) x $win_sym) +
        ($tt_res**2) / ((1/16) x $win_sym) +
        ($tc_res**2) / ((1/16) x $win_sym) +
        ($cg_res**2) / ((1/16) x $win_sym) +
        ($ca_res**2) / ((1/16) x $win_sym) +
        ($ct_res**2) / ((1/16) x $win_sym) +
        ($cc_res**2) / ((1/16) x $win_sym))

    push $chi_sq onto @chi_sq.

    shift (remove first element of an array and move all elements one step leftward) @trgt_fwd

    push $basefeed onto array @trgt_fwd

    assign $gg_count = 0, $ga_count = 0... one such variable for each of the 16 DPTs.

    assign index variable $i = 1

    perform ($win_sym) iterations:
        if the $i th elements of arrays @trgt_fwd and @trgt_revcomp are G and G, respectively, then increase $gg_count by 1. Repeat this conditional operation for each of the 16 possible DPTs.
        increase $i by 1

    calculate residuals: $gg_res = $gg_count - ((1/16) x $win_sym). Repeat this calculation for each of the 16 possible DPTs.

    push $gg_res onto @gg_res. Repeat this step for each of the 16 possible DPTs.

    Calculate chi square: $chi_sq = (($gg_res**2) / ((1/16) x $win_sym) +
        ($ga_res**2) / ((1/16) x $win_sym) +
        ($gt_res**2) / ((1/16) x $win_sym) +
        ($gc_res**2) / ((1/16) x $win_sym) +
        ($ag_res**2) / ((1/16) x $win_sym) +
        ($aa_res**2) / ((1/16) x $win_sym) +
        ($at_res**2) / ((1/16) x $win_sym) +
        ($ac_res**2) / ((1/16) x $win_sym) +
        ($tg_res**2) / ((1/16) x $win_sym) +
        ($ta_res**2) / ((1/16) x $win_sym) +
        ($tt_res**2) / ((1/16) x $win_sym) +
        ($tc_res**2) / ((1/16) x $win_sym) +
        ($cg_res**2) / ((1/16) x $win_sym) +
        ($ca_res**2) / ((1/16) x $win_sym) +
        ($ct_res**2) / ((1/16) x $win_sym) +
        ($cc_res**2) / ((1/16) x $win_sym))

    push $chi_sq onto @chi_sq.

107

save @gg_res to file. Repeat this step for each of the 16 possible DPTs.

save @chi_sq to file
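The sliding DPT procedure can be sketched end to end in Python. As with any translation of a garbled figure, this is a hedged reading rather than the original PERL program: the function name `dpt_scan`, the dictionary of residual lists (standing in for the sixteen @gg_res ... @cc_res arrays) and the zero-based comparison over the first `win_sym` elements are all assumptions, and input outside G, A, T, C raises a `KeyError`.

```python
from collections import deque
from itertools import product

COMPLEMENT = str.maketrans("GATC", "CTAG")
DPTS = ["".join(pair) for pair in product("GATC", repeat=2)]  # GG, GA, ... CC


def dpt_scan(input_seq, win_sym):
    """Return (residual_lists, chi_sq_list), with two entries per base fed in."""
    input_seq = input_seq.upper()
    if win_sym <= 0 or len(input_seq) < 2 * win_sym:
        raise ValueError("sequence must hold at least 2 * win_sym bases")

    win_seq = input_seq[:2 * win_sym]
    remaining = input_seq[2 * win_sym:]
    trgt_fwd = deque(win_seq)
    trgt_revcomp = deque(win_seq.translate(COMPLEMENT)[::-1])
    expected = win_sym / 16  # uniform expectation per DPT

    residual_lists = {dpt: [] for dpt in DPTS}
    chi_sq_list = []

    def record_axis():
        # Count DPTs at the current axis, then store residuals and chi square.
        counts = dict.fromkeys(DPTS, 0)
        for i in range(win_sym):
            counts[trgt_fwd[i] + trgt_revcomp[i]] += 1
        chi_sq = 0.0
        for dpt in DPTS:
            res = counts[dpt] - expected
            residual_lists[dpt].append(res)
            chi_sq += res ** 2 / expected
        chi_sq_list.append(chi_sq)

    record_axis()
    for basefeed in remaining:
        # First half step: advance the reverse-complement array only.
        trgt_revcomp.appendleft(basefeed.translate(COMPLEMENT))
        trgt_revcomp.pop()
        record_axis()
        # Second half step: advance the forward array.
        trgt_fwd.popleft()
        trgt_fwd.append(basefeed)
        record_axis()
    return residual_lists, chi_sq_list


if __name__ == "__main__":
    residual_lists, chi_sq_list = dpt_scan("GAATTCA", 3)
    print([round(value, 4) for value in chi_sq_list])
```

Writing `residual_lists` and `chi_sq_list` to disk would correspond to the final save steps; that part is left out because the source does not specify a file format.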